We use lmerTest to perform the linear mixed models. It allows us to test if there is a significant change whilst controlling for different patients.
# Libraries
library(ggplot2)
library(cowplot)
theme_set(theme_cowplot())
library(lmerTest)
library(plyr)
library(reshape2)
Here we define functions used to measure distance. Firstly, we define diff_state_dist which is our measure of divergence, the number genomic bins that are different between two samples divided by total number of bins are aberrant (!=2) in at least one sample.
The second function pga_diff_dist, measures the difference in PGA (as percentage points).
Here we process the clinical data and the CNA calls. We read in the clinical data and then the calls. We also then choose to saturate calls of losses and gain thus that 0 and 1 losses are all treated as a loss and that all gains (3, 4, 5 etc.) are treated as being simply ‘gained’. This means that a change from CN 2 to 5 is treated as being identical to 2 to 1 and 3 to 1, for instance.
We then calculate divergence for all comparisons of samples and record the organ, timepoint, type of sample (primary and metastasis) and block id for each.
Next, we calculate the mean divergence in each single time interval for each patient and we report the median value. This is provide a value for the general amount of divergence measured.
## [1] "Median mean timepoint divergence: 0.446"
Here we plot the divergences observed within each single timepoint, within a single organ, to represent divergence in space.
Here we plot the divergences observed within each single timepoint, but this time within different organs sampled at these timepoints, and this represents divergence in space between the samples across the body. There are very few examples of timepoints in which multiple organs were assessed. Generally in the cohort colorectal samples are sampled in the first timepoint and liver in latter timepoints.
Now we collapse this data and group the within timepoint, single organ divergences into primary and metastasis categories to investigate if metastatic deposits show a higher or lower level of within organ diversity than primary tumours. Here we can also see that we have much more data for liver metastases.
Of course, to correctly investigate this we need to perform some statistics. Therefore here we create a mixed model (lmer) of Divergence ~ Metastasis where we use each patient as a random effect to control for the fact that patients may show different total levels of divergence.
## [1] "Mixed model (1) - Primary and Metastasis using patients as random intercepts"
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: dist ~ Type_i + (1 | Patient)
## Data: within_prim_mets
##
## REML criterion at convergence: -274.3
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.9544 -0.5994 0.0348 0.6337 3.2688
##
## Random effects:
## Groups Name Variance Std.Dev.
## Patient (Intercept) 0.02009 0.1417
## Residual 0.02183 0.1478
## Number of obs: 334, groups: Patient, 22
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 0.27857 0.04427 65.94358 6.293 2.87e-08 ***
## Type_iMetastasis 0.03917 0.03615 331.69030 1.083 0.279
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## Type_Mtstss -0.656
## [1] 0.2699961
## [1] "Repeat of mixed model (1) but for PGA difference"
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: pdiff ~ Type_i + (1 | Patient)
## Data: within_prim_mets
##
## REML criterion at convergence: -564.8
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.8739 -0.5932 -0.1060 0.4776 3.2093
##
## Random effects:
## Groups Name Variance Std.Dev.
## Patient (Intercept) 0.00425 0.06519
## Residual 0.00942 0.09706
## Number of obs: 334, groups: Patient, 22
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 9.362e-02 2.538e-02 1.178e+02 3.689 0.000343 ***
## Type_iMetastasis 9.851e-03 2.335e-02 3.318e+02 0.422 0.673310
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## Type_Mtstss -0.762
Now we would like to ask if there is a significant difference between divergence within liver and colorectal samples from the same patient and between them. For this we do not filter by timepoint. As we can see there are far fewer colorectal-colorectal comparisons compared to anything to the liver samples.
So now we ask if any of the categories have a significantly different mean using a simple one-way ANOVA (aov).
## [1] "An ANOVA for comparing divergence in colorectal and liver organs..."
## Df Sum Sq Mean Sq F value Pr(>F)
## Organ_Compare 2 0.43 0.21496 4.731 0.00913 **
## Residuals 637 28.94 0.04543
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] "An ANOVA for comparing difference in PGA in colorectal and liver organs..."
## Df Sum Sq Mean Sq F value Pr(>F)
## Organ_Compare 2 0.006 0.00316 0.193 0.825
## Residuals 637 10.453 0.01641
Now we want to know if divergence comparisons within timepoints are different from those not in the same timepoint. We now plot this simple category.
To assess this we simply ask what is the effect on divergence if a comparison is made within the same timepoint or not. We again do a mixed model (lmer) using the patient as a random effect.
## [1] "Mixed model (2) - General assessment of whether within timepoint and across timepoint is different"
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: dist ~ Same_tp + (1 | Patient)
## Data: within_tps_across_tps
##
## REML criterion at convergence: -575.8
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.06957 -0.65190 0.08715 0.71782 2.45416
##
## Random effects:
## Groups Name Variance Std.Dev.
## Patient (Intercept) 0.01912 0.1383
## Residual 0.02383 0.1544
## Number of obs: 721, groups: Patient, 22
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 0.40189 0.03147 23.24270 12.771 5.43e-12 ***
## Same_tpTRUE -0.07556 0.01192 704.18895 -6.341 4.07e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## Same_tpTRUE -0.184
## [1] "Repeat of mixed model (2) but for PGA difference"
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: pdiff ~ Same_tp + (1 | Patient)
## Data: within_tps_across_tps
##
## REML criterion at convergence: -1053.6
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.3485 -0.6916 -0.1002 0.5236 3.0628
##
## Random effects:
## Groups Name Variance Std.Dev.
## Patient (Intercept) 0.005582 0.07471
## Residual 0.012444 0.11155
## Number of obs: 721, groups: Patient, 22
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 0.13456 0.01773 23.85885 7.588 8.26e-08 ***
## Same_tpTRUE -0.03143 0.00860 707.17480 -3.655 0.000276 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## Same_tpTRUE -0.237
Now we are interested in those divergences between single intervals across the patients both as a total value and also normalised by time (divergence month-1).
Now we are interested in whether treatment affects these intervals, so we basically collapse the previous plot and plot divergence both normalised and not normalised by time in the off- and on- treatment categories.
Here we perform a mixed model to test if absolute differences in divergence are different between treatments, using patients as a random effect.
## [1] "Mixed model (3) - Divergence NOT normalised by time in treated and non-treated intervals using patients as random intercepts"
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: dist ~ Treatment_prior_recent_tp + (1 | Patient)
## Data: patient_dists_next_tp
##
## REML criterion at convergence: -219.1
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.27587 -0.69514 0.05507 0.71404 2.22127
##
## Random effects:
## Groups Name Variance Std.Dev.
## Patient (Intercept) 0.02271 0.1507
## Residual 0.02167 0.1472
## Number of obs: 280, groups: Patient, 21
##
## Fixed effects:
## Estimate Std. Error df t value
## (Intercept) 0.402022 0.039065 29.352775 10.291
## Treatment_prior_recent_tpTRUE -0.008758 0.030071 254.033221 -0.291
## Pr(>|t|)
## (Intercept) 2.99e-11 ***
## Treatment_prior_recent_tpTRUE 0.771
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## Trtm___TRUE -0.447
## [1] "Repeat of mixed model (3) but for PGA difference"
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: pdiff ~ Treatment_prior_recent_tp + (1 | Patient)
## Data: patient_dists_next_tp
##
## REML criterion at convergence: -389.9
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.4419 -0.5835 -0.0925 0.3880 3.1436
##
## Random effects:
## Groups Name Variance Std.Dev.
## Patient (Intercept) 0.008195 0.09052
## Residual 0.012037 0.10971
## Number of obs: 280, groups: Patient, 21
##
## Fixed effects:
## Estimate Std. Error df t value
## (Intercept) 0.130503 0.024930 30.457223 5.235
## Treatment_prior_recent_tpTRUE -0.006681 0.021674 218.572965 -0.308
## Pr(>|t|)
## (Intercept) 1.15e-05 ***
## Treatment_prior_recent_tpTRUE 0.758
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## Trtm___TRUE -0.501
Here we process the data to calculate percentage of the genome aberrated (PGA) and to compare from one timepoint to the next whether this goes up or down and record the difference. Here is PGA in one sample is 0.41 and in the next it is 0.43, we would report it as going up by 0.02 * 100 (2). We average this for each timepoint and report the median of the averages across the cohort.
## [1] "Median mean increase in PGA (+ / -): 11.3"
## [1] "Percentage points change: 6.5"